Bibliometric Analysis of Global Circular RNA Research Trends from 2007 to 2018

Objective: Circular RNA (circRNA) is of significant interest in genetic research. The aim of this study was to assess global trends in circRNA research output in order to shed new light on future research frontiers.
Materials and Methods: In this retrospective study, we searched the Web of Science Core Collection (WoSCC) database on March 21, 2019 to retrieve publications from 2007 to 2018. Excel 2013, CiteSpace V, and VOSviewer were used to evaluate bibliometric features that included publication output, countries/regions, institutions, journals, citation frequency, H-index, and research hotspots.
Results: Global cumulative publication output on circRNA consisted of 998 papers with a total of 28 595 citations during 2007-2018. China, the US, and Germany were the most prolific countries. China ranked first in H-index (60) and citations (13 333). The most productive institution was Nanjing Medical University with 73 papers. Biochemical and Biophysical Research Communications (impact factor [IF] 2017: 2.559) ranked first among journals in the number of publications (64 papers). The keywords shifted from "sequence", "intron", and "splice-site" to "transcriptome", "microRNA sponge", "exon circularization", and "circRNA biogenesis" over time. The burst keywords "transcriptome", "microRNA sponge", "exon circularization", and "circRNA biogenesis" were the latest frontiers by 2018.
Conclusion: This is a relatively novel bibliometric analysis of research related to circRNA. The results show that publications have continuously increased in the past decade. China, the US, and Germany were the leading countries/regions in terms of quantity. Recent studies on topics related to circRNA biogenesis and function should be closely followed in this field.


Introduction
The concept of "circular RNA (circRNA)" was proposed by Sanger et al. (1) when they reported that viroids, which are pathogenic to certain higher plants, are single-stranded covalently closed circular RNA molecules. circRNAs mostly stem from either exons (2,3) or introns (4,5). The covalently closed loop has neither 5´-3´ polarity nor a polyadenylated tail (6), which distinguishes circRNAs from linear RNAs. In addition, circRNAs are more stable, even when treated with RNase R (7). Researchers initially believed that circRNAs were by-products of aberrant splicing and had little role in biological processes (2). With the rapid advances in high-throughput RNA sequencing (RNA-seq) and bioinformatics, numerous endogenous, diverse, widespread, and conserved circRNAs have been identified (8-10). As a result, these molecules have attracted renewed interest from researchers. Of particular note, some studies have shown that circRNAs can act as microRNA (miRNA) sponges and regulate linear RNA transcription and protein production to modulate gene expression (11-13).
Recent evidence indicates that circRNA plays a role in aging (9,14) and tissue development (15). circRNAs might be involved in neurological disorders (16), atherosclerotic vascular disease risk (17), Alzheimer's disease (18), and cancer (19). Thus, they might be valuable in disease diagnosis, prognosis, and precise therapy (20,21). In addition, databases established for circRNA in the last few years include circBase, CIRCpedia v2, and CircInteractome (Table S1, See Supplementary Online Information at www.celljournal.org). These databases make it more convenient for researchers to access and study circRNA, and facilitate progress in this field.
Although research related to circRNA has flourished in recent years, there have been limited attempts to systematically explore the development of scientific productivity in this area. To our knowledge, few reports on research activity in circRNA have been published internationally. Bibliometrics focuses on literature systems and the metrological characteristics of literature; it statistically and mathematically analyses written publications such as books and periodicals (22). This is a reliable method to analyse the literature in a field of science and characterize the tendency of research activity over time. Bibliometrics has been used to describe research trends in cardiovascular diseases (23), gastrointestinal diseases (24), and diabetes (25).
The aims of the present study were to systematically evaluate the international publication productivity of circRNA research using the Web of Science (WoS) from 2007 to 2018; analyse the most productive countries/institutions/journals; and measure the geographic and time distribution of literature that pertained to circRNA.

Patient and public involvement
There was no patient or public involvement in this retrospective study.

Sources of data and the search strategy
We searched the literature in the online versions of the Science Citation Index-Expanded (SCIE), Web of Science Core Collection (WoSCC), and Essential Science Indicators (ESI) databases on March 21, 2019. The data were downloaded from a public database as secondary data and did not involve ethical considerations. Thus, ethical approval was not required.

Data collection
WoSCC was used to analyse the characteristics of the publications, such as annual publications, countries/regions, institutions, journal sources, citation frequency, impact factor (IF), weighted IF (IF2), H-index, etc. The H-index, citation frequency, IF, and IF2 were used to assess the quality of scientific research performance. IFs were obtained from the Journal Citation Reports (JCR) 2017 and IF2 was calculated according to Rasim et al. (26).
The H-index, created by Hirsch (27) in 2005, can more accurately reflect the achievement of a country or an individual. This index takes both the quantity of published papers and the citation frequency into account: an H-index of H means that H papers published by a researcher/institution/country have each received at least H citations. A higher H-index indicates greater influence.
All data were gathered and verified by two authors independently (Ran Wu and Fei Guo). The data in "txt" format were downloaded from WoS and imported into Microsoft Excel 2013, CiteSpace V (64-bit), and VOSviewer (version 1.6.6, Leiden University, Leiden, The Netherlands).
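The definition above can be illustrated with a minimal Python sketch. The function name `h_index` and the citation counts are hypothetical and chosen only for illustration; they are not data from this study, and this is not the WoS implementation.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five papers (illustrative only).
example_citations = [10, 8, 5, 4, 3]
example_h = h_index(example_citations)
```

With these illustrative counts, four papers have at least four citations each, but there are not five papers with at least five citations, so the sketch yields an H-index of 4.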

Statistical analysis
A polynomial model fitted in Microsoft Excel 2013 was used to analyse the temporal tendency of the publications. The model $f(x) = ax^{4} + bx^{3} + cx^{2} + dx + e$ was applied to model the cumulative number of publications and to predict the future tendency of circRNA output. The symbol x represented the year, and f(x) represented the annual number of publications by year.
The world map of publication distribution was generated by GunnMap 2 (http://www.lert.co.nz/map/). GraphPad Prism version 6.01 (San Diego, CA, USA) was employed to analyse Pearson's correlation between publication number and gross domestic product (GDP) or population. P<0.05 was considered statistically significant. VOSviewer was used for the bibliometric analysis and visualization of the literature (28). In this study, it was used to analyse the collaboration between countries/regions and institutions. Network visualization of the citation analysis of journals was also derived through VOSviewer. CiteSpace V was used to construct a knowledge map of journals and keywords, and to obtain the burst keywords that had the strongest citation bursts.
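A fourth-degree polynomial fit of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the study used Microsoft Excel 2013, whereas the sketch uses NumPy's `polyfit`; the helper `fit_quartic`, the shift of the year axis to start at zero, and the synthetic counts are all hypothetical and are not the study's data or procedure.

```python
import numpy as np


def fit_quartic(years, counts):
    """Least-squares fit of f(x) = a*x^4 + b*x^3 + c*x^2 + d*x + e."""
    years = np.asarray(years, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Assumption: shift years so x starts at 0, which keeps the fit well conditioned.
    origin = years.min()
    x = years - origin
    coeffs = np.polyfit(x, counts, deg=4)  # coefficients a, b, c, d, e
    fitted = np.polyval(coeffs, x)
    # Coefficient of determination of the fitted curve.
    ss_res = float(np.sum((counts - fitted) ** 2))
    ss_tot = float(np.sum((counts - counts.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot

    def predict(year):
        """Evaluate the fitted polynomial at a given calendar year."""
        return float(np.polyval(coeffs, year - origin))

    return coeffs, r_squared, predict


# Synthetic counts generated from a known quartic (illustrative, NOT study data).
demo_years = list(range(2007, 2019))
demo_counts = [0.05 * (y - 2007) ** 4 + 2.0 for y in demo_years]
demo_coeffs, demo_r2, demo_predict = fit_quartic(demo_years, demo_counts)
```

Calling `demo_predict(2019)` extrapolates the fitted curve one year beyond the data, which mirrors how a one-year-ahead estimate can be read off a fitted trend line. Extrapolation from a high-degree polynomial is sensitive to small changes in the data, so such estimates are best treated as rough indications.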
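For readers unfamiliar with Pearson's correlation coefficient, a minimal pure-Python sketch is given below. The function name `pearson_r` is hypothetical, the inputs are placeholders rather than the study's country-level data, and the sketch does not compute the P value that GraphPad Prism reports.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred cross-products and centred squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values (illustrative only): a perfectly linear relationship.
r_demo = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

A coefficient close to 1 indicates a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little or no linear relationship.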

Distribution of countries/regions that published research related to circular RNA
A total of 998 studies fulfilled the search criteria (Fig.1A, Fig.S1, See Supplementary Online Information at www.celljournal.org), of which the majority were articles (868, 87.0%), followed by reviews (130, 13.0%). Figure 1B shows the geographical distribution of publications by individual countries/regions. There were a total of 46 countries/regions. Table 1 lists the top 10 most productive countries/regions; China ranked first with 729 publications, followed by the US (181), Germany (45), Denmark (23), and Canada (21). After adjustments for GDP and population, we noted that Denmark had the most publications per GDP (0.071) and the most publications per million people (3.986). There was a strong correlation between publication numbers and population (r=0.996, P<0.0001).

Distribution of institutions that published research related to circular RNA
A total of 919 institutions published research related to circRNA (Table S2, See Supplementary Online Information at www.celljournal.org). The most productive institution was Nanjing Medical University, which published a total of 73 papers. The Chinese Academy of Sciences and Fudan University tied for second with 41 papers each. Publications from the top 10 institutions accounted for 34.47% of all literature on circRNA. Figure 1D shows the collaborations between institutions with at least five publications.

Publication outputs and growth prediction
The annual publication numbers and accumulated publications are presented in Figure 2A. Annual publications remained low from 2007 to 2013, and remarkable growth was observed from 2014 onward. Overall, the publications related to circRNA consistently increased during the last decade.
As shown in Figure 2B, there was a significant correlation between the publication year and the annual number of circRNA publications ($R^{2}=0.997$). Worldwide, the annual output was estimated to reach 955 publications in 2019.

Distribution of published journals and funding agencies that focused on circular RNA
The 998 publications on circRNA research appeared in 331 journals. Figure 2C presents the dual-map overlay for the journals. The citing journal map is shown on the left and the cited journal map on the right. The disciplines covered by the journals are marked in the labels. Citation links that start from the journals on the left and end at those on the right are drawn as lines. The map shows one main citation path, which indicates that most publications appeared in molecular, biology, and immunology journals and mostly cited journals in the molecular, biology, and genetics fields. Figure 2D.

Citation and H-index analysis
Based on our analysis, the total citation frequency of all articles associated with circRNA was 28 595 by 2018. In terms of citations, China ranked first with 13 333 citations, followed by the US with 8460, Germany with 4798, Israel with 1615, and Denmark with 1344. The mean citation frequency per paper was 28.65, and Argentina had the highest frequency per paper (385), followed by Israel (179.44) and Germany (106.62) (Table S5, See Supplementary Online Information at www.celljournal.org). Figure 3A shows the citations and H-index results of the five most productive countries/regions. China, with an H-index of 60, ranked first.
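The per-paper figures follow directly from dividing total citations by the number of publications. The short Python check below reproduces two of them from the totals reported in this paper (998 papers with 28 595 citations overall; 45 papers with 4798 citations for Germany). The function name `citations_per_paper` is hypothetical.

```python
def citations_per_paper(total_citations, papers):
    """Mean citation frequency per paper."""
    if papers <= 0:
        raise ValueError("number of papers must be positive")
    return total_citations / papers


# Totals taken from the Results reported above.
overall_rate = citations_per_paper(28595, 998)
germany_rate = citations_per_paper(4798, 45)
```

Rounded to two decimals, these give 28.65 overall and 106.62 for Germany, in agreement with the reported values.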

Citation analysis was conducted across all 331 journals. Our results demonstrated that Molecular Cell had the highest citation frequency (1908), followed by Nature (1519) and Scientific Reports (1464) (Fig.3B).

Hotspots of studies on circular RNA
The total citations of the top 10 most cited publications ranged from 386 to 1519 (Table 3). The IFs of the journals that published these papers ranged from 2.766 to 41.577. The article that achieved the most citations (1519) was published by Memczak et al. (8).
Keywords used in the 998 papers were analysed with CiteSpace V. In total, we extracted 202 keywords with 648 links, selected as the 50 most frequent items from each year in the title, abstract, and keyword fields under the CiteSpace V default settings (Fig.S3, See Supplementary Online Information at www.celljournal.org). The top 20 keywords with the strongest citation bursts are shown in Figure 3C. According to the timeline, keywords shifted from "sequence", "intron", and "splice-site" to "transcriptome", "microRNA sponge", "exon circularization", and "circRNA biogenesis". The strongest ones included "exon circularization", "microRNA sponge", "mouse testi", "transcript", and "circRNA biogenesis".

Wu et al.

Discussion
Researchers previously focused on RNA with protein-coding functions derived from DNA. In-depth studies and advanced technology have made it clear that there are abundant and widespread noncoding RNAs (ncRNAs), which include miRNA, lncRNA, and circRNA. These RNAs could play significant roles in life processes (11,21,22). circRNA is an ncRNA that was once believed to be a by-product with little function (2,3). However, recent advances imply that circRNA might participate in both physiological and pathological processes (9,14-19). This study aimed to quantitatively and qualitatively evaluate the bibliometric characteristics of circRNA research, and to identify future research frontiers. Publications can, to some extent, be considered a measure of development within a certain research field.
Research related to circRNA has developed rapidly. To the best of our knowledge, this bibliometric analysis is the first attempt in this field. According to the results, the publication period can be divided into two stages. The first stage (2007-2013) showed a slow increase in publications and was the initial phase of circRNA research. The second stage (2014-2018) showed sharp growth and was the flourishing phase of circRNA research. The number of publications in the last few years exceeded the cumulative number in the early stage. With rapid and substantial progress in this field, researchers worldwide are expected to continue publishing productively on circRNA. According to the prediction curve, more literature will be published in the circRNA research field in the future.
China, the US, and Germany were the leading countries in quantity (total number of publications). After standardizing for GDP and population, Denmark ranked first with 0.071 publications per GDP and 3.986 publications per million people. Although Denmark ranked fourth with 23 publications, we believe that its highly developed economy and smaller population compared with China and the US placed it first after standardization. GDP and population are relevant to publication output (37). In the present study, we found no correlation between publication numbers and GDP; however, population showed a positive correlation with publication numbers. We used citations, citation frequency per paper, and H-index to analyse quality. Among the five most prolific countries/regions, China, with an absolute advantage in publication numbers, scored the highest in both citations and H-index. However, Germany had the highest citation frequency per paper. In terms of the collaboration network, wide-ranging cooperation was identified worldwide. The strongest cooperation was found between China and the US. Meanwhile, China and the US each had extensive cooperation with other countries/regions. Generally speaking, international cooperation results from cooperation between institutions worldwide (38). However, we found that Chinese institutions tend to collaborate nationally. This may partly explain the large output by China.
Chinese institutions led in the quantity of circRNA research output. The most productive institution worldwide was Nanjing Medical University. As noted above, national collaborations were widespread in China. There were over 10 links between the prolific institutions (e.g., Nanjing Medical University, Fudan University, and Shanghai Jiao Tong University) and other institutions. From this perspective, cooperation facilitates the progress of circRNA research. Another interesting finding was that the majority of funding agencies in this field were from China. If a researcher in China successfully applies for major funding, such as from the National Natural Science Foundation, and publishes high-quality articles, he or she might have priority in receiving further funding, which creates a cycle. This could also explain the productivity in China.
Keywords assigned to each article or review can delineate the topics involved in circRNA research. Burst keywords, which were captured by CiteSpace V in this study, can offer a reasonable prediction of research frontiers over time (39). The blue and red lines indicate time intervals and periods of citation bursts, respectively. With advanced technology, the focus of circRNA research shifted from discovery to in-depth mechanism and function, which was in line with the objective law of
i. Transcriptome: To date, circRNAs derived from pre-mRNA have been primarily identified through high-throughput RNA-seq. It was not until advanced RNA-seq methods capable of detecting non-polyadenylated transcriptomes emerged that circRNA was found to be diverse and widespread (8,10,12,29). Thus, transcriptome analysis is of great significance for circRNA identification and research.
ii. miRNA sponge: miRNAs are regulatory RNAs derived from hairpin transcripts. The results of recent studies show that some circRNAs might regulate gene expression at multiple levels (6). Of note, the primary finding was that circRNA can function as a miRNA sponge in the cytoplasm. circRNA competes with mRNA for miRNA binding and thereby regulates gene expression (29).
iii. Exon circularization and iv. circRNA biogenesis: The biogenesis of circRNA has been uncovered through in-depth study. For instance, circRNAs are transcribed by RNA polymerase II (30,40), and this biogenesis is regulated by the cis-regulatory elements and trans-acting factors that control splicing (6). Exon circularization is one of the essential steps in circRNA formation.
Although this is the first bibliometric study to comprehensively and objectively estimate global trends in circRNA research, there are some limitations. First, the total number of publications differs among the major databases, such as PubMed, Scopus, and Google Scholar. Using only the WoSCC database may have caused relevant publications to be overlooked. Second, the publications included in this analysis were restricted to the English language. Therefore, non-English papers, which are also important, were excluded from the present study. Finally, all the searches were conducted on a single day (March 21, 2019) to avoid bias; however, the database is constantly updated. Some high-quality publications are still being cited and this information may be omitted. Despite the aforementioned limitations, we believe that the overall results would not change substantially.

Conclusion
This study provides the first bibliometric analysis of global trends in circRNA research during 2007-2018. Research in this field has notably increased in recent years and will continue to grow. Most studies associated with circRNA arose from China, the US, and Germany. China was the leading country with the highest H-index and citations. International cooperation was widely found throughout the world. The most prolific institution, Nanjing Medical University, is located in China. Biochemical and Biophysical Research Communications had the most circRNA publications. "Transcriptome", "microRNA sponge", "exon circularization", and "circRNA biogenesis" might be the latest research frontiers for future circRNA research.